Results 1 - 20 of 29,691
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605639

ABSTRACT

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating their effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, from which the results are promising to provide valuable references for further wet experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.


Subjects
Biological Science Disciplines , Automated Pattern Recognition , Algorithms , Machine Learning , Semantics
2.
Cereb Cortex ; 34(4)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38610084

ABSTRACT

The application of wearable magnetoencephalography using optically-pumped magnetometers has drawn extensive attention in the field of neuroscience. Electroencephalogram systems can cover the whole head and reflect the overall activity of a large number of neurons. The efficacy of optically-pumped magnetometers in detecting event-related components can be validated through electroencephalogram results. Multivariate pattern analysis is capable of tracking the evolution of neurocognitive processes over time. In this paper, we adopted a classical Chinese semantic congruity paradigm and separately collected electroencephalogram and optically-pumped magnetometer signals. Then, we verified the consistency of the optically-pumped magnetometer and the electroencephalogram in detecting the N400 using a mutual information index. Multivariate pattern analysis revealed the difference in decoding performance of these two modalities, which can be further validated by dynamic/stable coding analysis on the temporal generalization matrix. The results from searchlight analysis provided a neural basis for this dissimilarity at the magnetoencephalography source level and the electroencephalogram sensor level. This study opens a new avenue for investigating the brain's coding patterns using wearable magnetoencephalography and reveals the differences in sensitivity between the two modalities in reflecting neuron representation patterns.


Subjects
Electroencephalography , Magnetoencephalography , Female , Male , Humans , Semantics , Evoked Potentials , Multivariate Analysis , China
3.
Zhongguo Zhong Yao Za Zhi ; 49(3): 596-606, 2024 Feb.
Article in Chinese | MEDLINE | ID: mdl-38621863

ABSTRACT

This study aims to optimize the prediction model of personalized water pills that has been established by our research group. Dioscoreae Rhizoma, Leonuri Herba, Codonopsis Radix, Armeniacae Semen Amarum, and calcined Oyster were selected as model medicines of powdery, fibrous, sugary, oily, and brittle materials, respectively. The model prescriptions were obtained by uniform mixing design. With hydroxypropyl methylcellulose E5 (HPMC-E5) aqueous solution as the adhesive, personalized water pills were prepared by extrusion and spheronization. The evaluation indexes in the pill preparation process and the multi-model statistical analysis were employed to optimize and evaluate the prediction model of personalized water pills. The prediction equation of the adhesive concentration was obtained as follows: $Y_1 = -4.172 + 3.63X_A + 15.057X_B + 1.838X_C - 0.997X_D$ (adhesive concentration of 10% when $Y_1 < 0$, and 20% when $Y_1 > 0$). The overall accuracy of the prediction model for adhesive concentration was 96.0%. The prediction equation of adhesive dosage was $Y_2 = 6.051 + 94.944X_A^{1.5} + 161.977X_B + 70.078X_C^{2} + 12.016X_D^{0.3} + 27.493X_E^{0.3} - 2.168X_F^{-1}$ ($R^2 = 0.954$, $P < 0.001$). Furthermore, the semantic prediction model for material classification of traditional Chinese medicines was used to classify the materials contained in the prescription, and thus the prediction model of personalized water pills was evaluated. The results showed that the prescriptions for model evaluation can be prepared with one-time molding, and the forming quality was better than that established by the research group earlier. This study has achieved the optimization of the prediction model of personalized water pills.


Subjects
Chinese Herbal Drugs , Traditional Chinese Medicine , Water , Semantics , Prescriptions
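
The adhesive-concentration rule reported in this abstract can be sketched in Python as follows. This is only an illustration: the coefficients are taken as printed in the abstract, while the function name and the treatment of X_A to X_D as plain numeric inputs are assumptions, since the abstract does not define those variables. The abstract also does not say what happens when Y_1 equals 0, so that case returns None here.

```python
def adhesive_concentration(x_a, x_b, x_c, x_d):
    """Evaluate the reported Y_1 equation and apply the stated threshold rule.

    Inputs are hypothetical numeric values for X_A..X_D; their meaning is
    not defined in the abstract.
    Returns (y1, concentration_percent); concentration is None when y1 == 0,
    a case the abstract leaves unspecified.
    """
    y1 = -4.172 + 3.63 * x_a + 15.057 * x_b + 1.838 * x_c - 0.997 * x_d
    if y1 < 0:
        return y1, 10   # 10% adhesive concentration when Y_1 < 0
    if y1 > 0:
        return y1, 20   # 20% adhesive concentration when Y_1 > 0
    return y1, None     # boundary case not covered by the abstract
```

For example, `adhesive_concentration(0, 1, 0, 0)` falls on the 20% side of the threshold, while all-zero inputs fall on the 10% side.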
4.
Zhongguo Zhong Yao Za Zhi ; 49(3): 587-595, 2024 Feb.
Article in Chinese | MEDLINE | ID: mdl-38621862

ABSTRACT

A method for material classification of traditional Chinese medicines based on the physical properties of powder has been established by our research group. This method involves pre-treatment of traditional Chinese medicine decoction pieces, powder preparation, and determination of physical properties, being cumbersome. In this study, the word segmentation logic of semantic analysis was adopted to establish the thesaurus and local standardized semantic word segmentation database with the macroscopic and microscopic characteristics of 36 model traditional Chinese medicines as the basic data. The physical properties of these medicines have been determined and the classification of these medicines is clear in the cluster analysis. A total of 55 keywords for powdery, fibrous, sugary, oily, and brittle materials were screened by association rules and the set inclusion and exclusion criteria, and the weights of the keywords were calculated. Furthermore, the algorithms of the keyword matching scores and the computation rules of the single or multiple material classification were established for building the intelligent model of semantic analysis for the material classification. The semantic classification results of the other 35 TCMs except Pseudostellariae Radix(multi-material medicine) agreed with the clustering results based on the physical properties of the powder, with an agreement rate of 97.22%. In model validation, the prediction results of semantic classification of traditional Chinese medicines were consistent with the clustering results based on the physical properties of powder, with an agreement rate of 83.33%. The results showed that the method of material classification based on semantic analysis was feasible, which laid a foundation for the development of intelligent decision-making technology for personalized traditional Chinese medicine preparations.


Subjects
Chinese Herbal Drugs , Traditional Chinese Medicine , Powders , Semantics , Plant Roots
5.
Sci Rep ; 14(1): 7697, 2024 04 02.
Article in English | MEDLINE | ID: mdl-38565624

ABSTRACT

The rapid increase in biomedical publications necessitates efficient systems to automatically handle Biomedical Named Entity Recognition (BioNER) tasks in unstructured text. However, accurately detecting biomedical entities is quite challenging due to the complexity of their names and the frequent use of abbreviations. In this paper, we propose BioBBC, a deep learning (DL) model that utilizes multi-feature embeddings and is constructed based on the BERT-BiLSTM-CRF to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a Bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Fields (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned within the text. The embedding layer generates enriched contextual representation vectors of the input by learning the text through four types of embeddings: part-of-speech tags (POS tags) embedding, char-level embedding, BERT embedding, and data-specific embedding. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. Our model is well-constructed and well-optimized for detecting different types of biomedical entities. Based on experimental results, our model outperformed state-of-the-art (SOTA) models with significant improvements on six benchmark BioNER datasets.


Subjects
Language , Semantics , Natural Language Processing , Benchmarking , Speech
6.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557678

ABSTRACT

Disease ontologies facilitate the semantic organization and representation of domain-specific knowledge. In the case of prostate cancer (PCa), large volumes of research results and clinical data have accumulated and need to be standardized for sharing and translational research. A formal representation of PCa-associated knowledge will be essential to diverse data standardization, data sharing, and future knowledge graph extraction, deep phenotyping and explainable artificial intelligence development. In this study, we constructed an updated PCa ontology (PCAO2) based on the ontology development life cycle. An online information retrieval system was designed to ensure the usability of the ontology. The PCAO2 with a subclass-based taxonomic hierarchy covers the major biomedical concepts for PCa-associated genotypic, phenotypic and lifestyle data. The current version of the PCAO2 contains 633 concepts organized under three biomedical viewpoints, namely, epidemiology, diagnosis and treatment. These concepts are enriched by the addition of definition, synonym, relationship and reference. For the precision diagnosis and treatment, the PCa-associated genes and lifestyles are integrated in the viewpoint of epidemiological aspects of PCa. PCAO2 provides a standardized and systematized semantic framework for studying large amounts of heterogeneous PCa data and knowledge, which can be further edited and enriched by the scientific community. The PCAO2 is freely available at https://bioportal.bioontology.org/ontologies/PCAO, http://pcaontology.net/ and http://pcaontology.net/mobile/.


Subjects
Biological Ontologies , Prostatic Neoplasms , Humans , Male , Artificial Intelligence , Semantics , Prostatic Neoplasms/genetics
7.
PLoS One ; 19(4): e0296874, 2024.
Article in English | MEDLINE | ID: mdl-38564586

ABSTRACT

One of the main theoretical distinctions between reading models is how and when they predict semantic processing occurs. Some models assume semantic activation occurs after word-form is retrieved. Other models assume there is no word-form, and that what people think of as word-form is actually just semantics. These models thus predict semantic effects should occur early in reading. Results showing that words with inconsistent spelling-sound correspondences are faster to read aloud if they are imageable/concrete than if they are abstract have been used as evidence supporting this prediction, although null-effects have also been reported. To investigate this, I used Monte-Carlo simulation to create a large set of simulated experiments from RTs taken from different databases. The results showed significant main effects of concreteness and spelling-sound consistency, as well as age-of-acquisition, a variable that can potentially confound the results. In contrast, simulations showing a significant interaction between spelling-sound consistency and concreteness did not occur above chance, even without controlling for age-of-acquisition. These results support models that use lexical form. In addition, they suggest significant interactions from previous experiments may have occurred due to idiosyncratic items affecting the results and random noise causing the occasional statistical error.


Subjects
Reading , Semantics , Humans , Language
8.
Nat Commun ; 15(1): 2880, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570504

ABSTRACT

Deciphering the relationship between a gene and its genomic context is fundamental to understanding and engineering biological systems. Machine learning has shown promise in learning latent relationships underlying the sequence-structure-function paradigm from massive protein sequence datasets. However, to date, limited attempts have been made in extending this continuum to include higher order genomic context information. Evolutionary processes dictate the specificity of genomic contexts in which a gene is found across phylogenetic distances, and these emergent genomic patterns can be leveraged to uncover functional relationships between gene products. Here, we train a genomic language model (gLM) on millions of metagenomic scaffolds to learn the latent functional and regulatory relationships between genes. gLM learns contextualized protein embeddings that capture the genomic context as well as the protein sequence itself, and encode biologically meaningful and functionally relevant information (e.g. enzymatic function, taxonomy). Our analysis of the attention patterns demonstrates that gLM is learning co-regulated functional modules (i.e. operons). Our findings illustrate that gLM's unsupervised deep learning of the metagenomic corpus is an effective and promising approach to encode functional semantics and regulatory syntax of genes in their genomic contexts and uncover complex relationships between genes in a genomic region.


Subjects
Machine Learning , Semantics , Phylogeny , Operon , Proteins , Metagenomics
9.
J Speech Lang Hear Res ; 67(4): 1229-1242, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38563688

ABSTRACT

PURPOSE: Almost 40 years after its development, in this article, we reexamine the relevance and validity of the ubiquitously used Revised Speech Perception in Noise (R-SPiN) sentence corpus. The R-SPiN corpus includes "high-context" and "low-context" sentences and has been widely used in the field of hearing research to examine the benefit derived from semantic context across English-speaking listeners, but research investigating age differences has yielded somewhat inconsistent findings. We assess the appropriateness of the corpus for use today in different English-language cultures (i.e., British and American) as well as for older and younger adults. METHOD: Two hundred forty participants, including older (60-80 years) and younger (19-31 years) adult groups in the United Kingdom and United States, completed a cloze task consisting of R-SPiN sentences with the final word removed. Cloze, as a measure of predictability, and entropy, as a measure of response uncertainty, were compared between culture and age groups. RESULTS: Most critically, of the 200 "high-context" stimuli, only around half were assessed as highly predictable for older adults (United Kingdom: 109; United States: 107); and fewer still, for younger adults (United Kingdom: 75; United States: 81). We also found dominant responses to these "high-context" stimuli varied between cultures, with U.S. responses being more likely to match the original R-SPiN target. CONCLUSIONS: Our findings highlight the issue of incomplete transferability of corpus items across English-language cultures as well as diminished equivalency for older and younger adults. By identifying relevant items for each population, this work could facilitate the interpretation of inconsistent findings in the literature, particularly relating to age effects.


Subjects
Speech Perception , Humans , Aged , Noise , Hearing/physiology , Language , Semantics
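
The two measures named in this abstract, cloze (predictability) and entropy (response uncertainty), can be illustrated with a short Python sketch. It assumes the common definitions: cloze as the proportion of responses matching a target word, and Shannon entropy in bits over the distribution of responses. The abstract does not give the study's exact scoring rules, so the function name, the lowercasing and whitespace trimming, and the sample responses in the usage note are all illustrative assumptions.

```python
from collections import Counter
import math


def cloze_and_entropy(responses, target):
    """Return (cloze probability of `target`, Shannon entropy in bits).

    `responses` is a list of sentence-final words supplied by participants.
    Normalization (strip + lowercase) is an assumption made for this sketch.
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    # Proportion of participants who produced the target word.
    cloze = counts[target.strip().lower()] / total
    # Uncertainty of the response distribution: 0 when everyone agrees.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return cloze, entropy
```

With the hypothetical responses `["dog", "dog", "cat", "dog"]` and target `"dog"`, three of four responses match the target, so the cloze value is 0.75, and the entropy is above zero because the responses are not unanimous.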
10.
Laryngorhinootologie ; 103(4): 252-260, 2024 Apr.
Article in German | MEDLINE | ID: mdl-38565108

ABSTRACT

Language processing can be measured objectively using late components in the evoked brain potential. The most established component in this area of research is the N400 component, a negativity that peaks at about 400 ms after stimulus onset with a centro-parietal maximum. It reflects semantic processing. Its presence, as well as its temporal and quantitative expression, allows conclusions to be drawn about the quality of processing. It is therefore suitable for measuring speech comprehension in special populations, such as cochlear implant (CI) users. The following is an overview of the use of the N400 component as a tool for studying language processes in CI users. We present studies with adult CI users, where the N400 reflects the quality of speech comprehension with the new hearing device, and we present studies with children, where the emergence of the N400 component reflects the acquisition of their very first vocabulary.


Subjects
Cochlear Implants , Speech Perception , Adult , Child , Female , Humans , Male , Comprehension/physiology , Electroencephalography , Evoked Potentials/physiology , Language , Language Development , Semantics , Speech Perception/physiology
11.
IEEE J Biomed Health Inform ; 28(4): 2294-2303, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38598367

ABSTRACT

Medicine package recommendation aims to assist doctors in clinical decision-making by recommending appropriate packages of medicines for patients. Current methods model this task as a multi-label classification or sequence generation problem, focusing on learning relationships between individual medicines and other medical entities. However, these approaches uniformly overlook the interactions between medicine packages and other medical entities, potentially resulting in a lack of completeness in recommended medicine packages. Furthermore, medicine commonsense knowledge considered by current methods is notably limited, making it challenging to delve into the decision-making processes of doctors. To solve these problems, we propose DIAGNN, a Dual-level Interaction Aware heterogeneous Graph Neural Network for medicine package recommendation. Specifically, DIAGNN explicitly models interactions of medical entities within electronic health records(EHRs) at two levels, individual medicine and medicine package, leveraging a heterogeneous graph. A dual-level interaction aware graph convolutional network is utilized to capture semantic information in the medical heterogeneous graph. Additionally, we incorporate medication indications into the medical heterogeneous graph as medicine commonsense knowledge. Extensive experimental results on real-world datasets validate the effectiveness of the proposed method.


Subjects
Clinical Decision-Making , Electronic Health Records , Humans , Knowledge , Computer Neural Networks , Semantics
12.
BMJ Health Care Inform ; 31(1)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38642920

ABSTRACT

OBJECTIVES: Incident reporting systems are widely used to identify risks and enable organisational learning. Free-text descriptions contain important information about factors associated with incidents. This study aimed to develop error scores by extracting information about the presence of error factors in incidents using an original decision-making model that partly relies on natural language processing techniques. METHODS: We retrospectively analysed free-text data from reports of incidents between January 2012 and December 2022 from Nagoya University Hospital, Japan. The sample data were randomly allocated to equal-sized training and validation datasets. We conducted morphological analysis on free text to segment terms from sentences in the training dataset. We calculated error scores for terms, individual reports and reports from staff groups according to report volume size and compared these with conventional classifications by patient safety experts. We also calculated accuracy, recall, precision and F-score values from the proposed 'report error score'. RESULTS: Overall, 114 013 reports were included. We calculated 36 131 'term error scores' from the 57 006 reports in the training dataset. There was a significant difference in error scores between reports of incidents categorised by experts as arising from errors (p<0.001, d=0.73 (large)) and other incidents. The accuracy, recall, precision and F-score values were 0.8, 0.82, 0.85 and 0.84, respectively. Group error scores were positively associated with expert ratings (correlation coefficient, 0.66; 95% CI 0.54 to 0.75, p<0.001) for all departments. CONCLUSION: Our error scoring system could provide insights to improve patient safety using aggregated incident report data.


Subjects
Risk Management , Semantics , Humans , Retrospective Studies , Risk Management/methods , Patient Safety , University Hospitals
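
The four evaluation values reported in this abstract (accuracy, recall, precision and F-score) are standard functions of a binary confusion matrix, and the Python sketch below shows how they relate. It assumes the F-score is the usual F1 (harmonic mean of precision and recall), which the abstract does not state explicitly. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Assumes all denominators are non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)          # share of actual positives that were found
    precision = tp / (tp + fp)       # share of predicted positives that were right
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

For instance, `classification_metrics(tp=8, fp=2, fn=2, tn=8)` describes a hypothetical classifier that is right on 16 of 20 reports.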
13.
PLoS One ; 19(4): e0299746, 2024.
Article in English | MEDLINE | ID: mdl-38635575

ABSTRACT

In this exploratory study, we investigate the influence of several semantic-pragmatic and syntactic factors on prosodic prominence production in German, namely referential and lexical newness/givenness, grammatical role, and position of a referential target word within a sentence. Especially in terms of the probabilistic distribution of accent status (nuclear, prenuclear, deaccentuation) we find evidence for an additive influence of the discourse-related and syntactic cues, with lexical newness and initial sentence position showing the strongest boosting effects on a target word's prosodic prominence. The relative strength of the initial position is found in nearly all prosodic factors investigated, both discrete (such as the choice of accent type) and gradient (e.g., scaling of the Tonal Center of Gravity and intensity). Nevertheless, the differentiation of prominence relations is information-structurally less important in the beginning of an utterance than near the end: The prominence of the final object relative to the surrounding elements, especially the verbal component, is decisive for the interpretation of the sentence. Thus, it seems that a speaker adjusts locally important prominence relations (object vs. verb in sentence-final position) in addition to a more global, rhythmically determined distribution of prosodic prominences across an utterance.


Subjects
Semantics , Speech Perception , Cues , Language
14.
Sci Rep ; 14(1): 8924, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637613

ABSTRACT

Accurate measurement of abdominal aortic aneurysm is essential for selecting suitable stent-grafts to avoid complications of endovascular aneurysm repair. However, the conventional image-based measurements are inaccurate and time-consuming. We introduce an automated workflow including semantic segmentation with active learning (AL) and measurement using an application programming interface of computer-aided design. Three hundred patients underwent CT scans, and semantic segmentation for aorta, thrombus, calcification, and vessels was performed in 60-300 cases with AL across five stages using UNETR, SwinUNETR, and nnU-Net, which consisted of 2D U-Net, 3D U-Net, a 2D-3D U-Net ensemble, and a cascaded 3D U-Net. Seven clinical landmarks were automatically measured for 96 patients. In AL stage 5, 3D U-Net achieved the highest Dice similarity coefficient (DSC) with statistically significant differences (p < 0.01) except from the 2D-3D U-Net ensemble and cascaded 3D U-Net. SwinUNETR excelled in 95% Hausdorff distance (HD95) with significant differences (p < 0.01) except from UNETR and 3D U-Net. DSC of aorta and calcification were saturated at stages 1 and 4, whereas thrombus and vessels were continuously improved at stage 5. The segmentation time between the manual and AL-corrected segmentation using the best model (3D U-Net) was reduced to 9.51 ± 1.02, 2.09 ± 1.06, 1.07 ± 1.10, and 1.07 ± 0.97 min for the aorta, thrombus, calcification, and vessels, respectively (p < 0.001). All measurement and tortuosity ratio measured -1.71 ± 6.53 mm and -0.15 ± 0.25. We developed an automated workflow with semantic segmentation and measurement, demonstrating its efficiency compared to conventional methods.


Subjects
Abdominal Aortic Aneurysm , Blood Vessel Prosthesis Implantation , Calcinosis , Endovascular Procedures , Thrombosis , Humans , Abdominal Aortic Aneurysm/diagnostic imaging , Problem-Based Learning , Semantics , X-Ray Computed Tomography , Computer-Assisted Image Processing
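
The Dice similarity coefficient (DSC) used in this abstract to compare segmentation models has a simple definition for binary masks: twice the overlap divided by the total size of both masks. The Python sketch below is a minimal illustration on flattened 0/1 masks and is not the study's evaluation code. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen for this sketch.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two equal-length binary masks.

    `pred` and `truth` are flat sequences of 0/1 labels (one per voxel).
    DSC = 2 * |pred AND truth| / (|pred| + |truth|).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2 * intersection / size_sum
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])` has one overlapping voxel out of two in each mask, giving a DSC of 0.5.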
15.
J Acoust Soc Am ; 155(4): 2687-2697, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38639927

ABSTRACT

One speech sound can be associated with multiple meanings through iconicity, indexicality, and/or systematicity. It was not until recently that this "pluripotentiality" of sound symbolism attracted serious attention, and it remains uninvestigated how pluripotentiality may arise. In the current study, Japanese, Korean, Mandarin, and English speakers rated unfamiliar jewel names on three semantic scales: size, brightness, and hardness. The results showed language-specific and cross-linguistically shared pluripotential sound symbolism. Japanese speakers associated voiced stops with large and dark jewels, whereas Mandarin speakers associated [i] with small and bright jewels. Japanese, Mandarin, and English speakers also associated lip rounding with darkness and softness. These sound-symbolic meanings are unlikely to be obtained through metaphorical or metonymical extension, nor are they reported to colexify. Notably, in a purely semantic network without the mediation of lip rounding, softness can instead be associated with brightness, as illustrated by synesthetic metaphors such as yawaraka-na hizashi /jawaɾakanaçizaɕi/ "a gentle (lit. soft) sunshine" in Japanese. These findings suggest that the semantic networks of sound symbolism may not coincide with those of metaphor or metonymy. The current study summarizes the findings in the form of (phono)semantic maps to facilitate cross-linguistic comparisons of pluripotential sound symbolism.


Subjects
Language , Semantic Web , Symbolism , Semantics , Phonetics
16.
BMC Bioinformatics ; 25(1): 152, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627652

ABSTRACT

BACKGROUND: Text summarization is a challenging problem in Natural Language Processing, which involves condensing the content of textual documents without losing their overall meaning and information content. In the domain of bio-medical research, summaries are critical for efficient data analysis and information retrieval. While several bio-medical text summarizers exist in the literature, they often miss out on an essential text aspect: text semantics. RESULTS: This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers. CONCLUSION: The usage of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid in efficient data analysis and information retrieval in the field of biomedical research.


Subjects
Algorithms , Biomedical Research , Semantics , Information Storage and Retrieval , Natural Language Processing
17.
Nat Commun ; 15(1): 2981, 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38582783

ABSTRACT

Encoding- and retrieval-related neural activity jointly determine mnemonic success. We ask whether electroencephalographic activity can reliably predict encoding and retrieval success on individual trials. Each of 98 participants performed a delayed recall task on 576 lists across 24 experimental sessions. Logistic regression classifiers trained on spectral features measured immediately preceding spoken recall of individual words successfully predict whether or not those words belonged to the target list. Classifiers trained on features measured during word encoding also reliably predict whether those words will be subsequently recalled and further predict the temporal and semantic organization of the recalled items. These findings link neural variability predictive of successful memory with item-to-context binding, a key cognitive process thought to underlie episodic memory function.


Subjects
Electroencephalography , Episodic Memory , Humans , Mental Recall , Semantics
18.
PLoS One ; 19(4): e0299490, 2024.
Article in English | MEDLINE | ID: mdl-38635650

ABSTRACT

Researchers commonly perform sentiment analysis on large collections of short texts like tweets, Reddit posts or newspaper headlines that are all focused on a specific topic, theme or event. Usually, general-purpose sentiment analysis methods are used. These perform well on average but miss the variation in meaning that happens across different contexts, for example, the word "active" has a very different intention and valence in the phrase "active lifestyle" versus "active volcano". This work presents a new approach, CIDER (Context Informed Dictionary and sEmantic Reasoner), which performs context-sensitive linguistic analysis, where the valence of sentiment-laden terms is inferred from the whole corpus before being used to score the individual texts. In this paper, we detail the CIDER algorithm and demonstrate that it outperforms state-of-the-art generalist unsupervised sentiment analysis techniques on a large collection of tweets about the weather. CIDER is also applicable to alternative (non-sentiment) linguistic scales. A case study on gender in the UK is presented, with the identification of highly gendered and sentiment-laden days. We have made our implementation of CIDER available as a Python package: https://pypi.org/project/ciderpolarity/.


Subjects
Social Media , Gender Identity , Semantics , Sentiment Analysis , Algorithms
19.
Nat Commun ; 15(1): 2848, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565531

ABSTRACT

Spatial transcriptomics has revolutionized the study of gene expression within tissues, while preserving spatial context. However, annotating spatial spots' biological identity remains a challenge. To tackle this, we introduce Pianno, a Bayesian framework automating structural semantics annotation based on marker genes. Comprehensive evaluations underscore Pianno's remarkable prowess in precisely annotating a wide array of spatial semantics, ranging from diverse anatomical structures to intricate tumor microenvironments, as well as in estimating cell type distributions, across data generated from various spatial transcriptomics platforms. Furthermore, Pianno, in conjunction with clustering approaches, uncovers a region- and species-specific excitatory neuron subtype in the deep layer 3 of the human neocortex, shedding light on cellular evolution in the human neocortex. Overall, Pianno equips researchers with a robust and efficient tool for annotating diverse biological structures, offering new perspectives on spatial transcriptomics data.


Subjects
Gene Expression Profiling , Semantics , Humans , Bayes Theorem , Transcriptome
20.
PLoS One ; 19(4): e0300767, 2024.
Article in English | MEDLINE | ID: mdl-38578733

ABSTRACT

Semantic segmentation of cityscapes via deep learning is an essential and game-changing research topic that offers a more nuanced comprehension of urban landscapes. Deep learning techniques tackle urban complexity and diversity, which unlocks a broad range of applications. These include urban planning, transportation management, autonomous driving, and smart city efforts. Through rich context and insights, semantic segmentation helps decision-makers and stakeholders make educated decisions for sustainable and effective urban development. This study presents an in-depth exploration of cityscape image segmentation using the U-Net deep learning model. The proposed U-Net architecture comprises an encoder and decoder structure. The encoder uses convolutional layers and downsampling to extract hierarchical information from input images. Each downsampling step reduces spatial dimensions and increases feature depth, aiding context acquisition. Batch normalization and dropout layers stabilize models and prevent overfitting during encoding. The decoder reconstructs higher-resolution feature maps using "UpSampling2D" layers. Through extensive experimentation and evaluation on the Cityscapes dataset, this study demonstrates the effectiveness of the U-Net model in achieving state-of-the-art results in image segmentation. The results clearly show that the proposed model achieves higher accuracy, mean IoU and mean Dice than existing models.


Subjects
Deep Learning , Semantics , City Planning , Empirical Research , Hydrolases , Computer-Assisted Image Processing